Abstract

We define the min-min expectation selection problem (resp. max-min expectation
selection problem) to be that of selecting k out of n given discrete probability
distributions, to minimize (resp. maximize) the expectation of the minimum value
resulting when independent random variables are drawn from the selected distributions.
We assume each distribution has finitely many atoms. Let d be the number
of distinct values in the support of the distributions. We show that if d is a constant
greater than 2, the min-min expectation problem is NP-complete but admits a fully
polynomial time approximation scheme. For d an arbitrary integer, it is NP-hard
to approximate the min-min expectation problem with any constant approximation
factor. The max-min expectation problem is polynomially solvable for constant d;
we leave open its complexity for variable d. We also show similar results for binary
selection problems in which we must choose one distribution from each of n pairs of
distributions.



1 Introduction 

Suppose we are given an integer n and the distributions of n independent random variables
$Y_i$, for $1 \le i \le n$. We will let $l_0 < l_1 < \cdots < l_{d-1}$ denote the possible values assumed by
each of the random variables, and assume these are specified in the input. We assume that
each $Y_i$ is then specified in the input by giving, for each $j \in \{1, 2, \ldots, d-1\}$, the value of
$\Pr\{Y_i \ge l_j\}$. (Note that $\Pr\{Y_i \ge l_0\}$ need not be specified since it will always be 1.) For
complexity purposes, we assume that all integer values in the input are given in binary,
and all other values are specified as ratios of integers. Let N be the total number of bits
required to specify the input.

Suppose we wish to choose k of the random variables so as to minimize the expected
value of the minimum of the selected variables. More formally, if we wish to choose a
subset $S \subseteq \{1, 2, \ldots, n\}$ with $|S| = k$ so as to minimize

\[ E\left[\min_{i \in S} Y_i\right] \tag{1} \]

we call this the min-min expectation subset selection problem. We call the variation in
which we wish to maximize (1) the max-min expectation subset selection problem.
When we consider approximation results, we will assume

\[ l_0 \ge 0. \tag{2} \]

(Note that without this assumption we could use any approximation ratio bound to solve
the decision problem, by translating all of the $l_j$ appropriately.)
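To make objective (1) concrete, the following is a minimal brute-force sketch that evaluates the expected minimum exactly for every k-subset. The function names `expected_min` and `best_subset`, the dictionary representation of a distribution, and the sample data are all assumptions made for illustration; the exhaustive search is exponential and is only meant to pin down the definition.

```python
from fractions import Fraction
from itertools import combinations, product

def expected_min(dists):
    """Exact E[min] of independent variables, each given as {value: probability}."""
    total = Fraction(0)
    for outcome in product(*(d.items() for d in dists)):
        prob = Fraction(1)
        for _, p in outcome:
            prob *= p
        total += prob * min(v for v, _ in outcome)
    return total

def best_subset(dists, k, maximize=False):
    """Try every k-subset and return (objective value, chosen indices)."""
    pick = max if maximize else min
    return pick((expected_min([dists[i] for i in S]), S)
                for S in combinations(range(len(dists)), k))

# Hypothetical instance with n = 3 variables and k = 2.
half = Fraction(1, 2)
Y = [{1: half, 4: half}, {2: Fraction(1)}, {1: Fraction(1, 4), 3: Fraction(3, 4)}]
min_min = best_subset(Y, 2)                  # min-min subset selection
max_min = best_subset(Y, 2, maximize=True)   # max-min subset selection
```

Exact rationals are used so that ties between subsets are detected without rounding error.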

We also consider the problem of maximizing (resp. minimizing) the expected value of the
maximum of the selected variables, which we call the max-max and min-max problems. By
a simple change of sign of all of the $l_j$, we see that the exact optimization forms of the
max-max and min-min problems, and of the min-max and max-min problems, are equivalent in
difficulty. (Note that this equivalence does not carry over to the approximation forms of
the problems, since the sign change would cause a violation of (2).)

To motivate the min-min problem, suppose you are a user of a peer-to-peer file sharing 
service such as Gnutella. After performing a search, you have located several servers 
hosting copies of a file that you urgently need to download. If bandwidth at your end of 
the network is not a limiting factor, you may be able to speed your download by requesting 
downloads from more than one server, and stopping when the first of these downloads 
reaches completion. Suppose you have the capacity for k simultaneous requests, and you 
have information from each server such as its connection type and echo time from which 
you can estimate a distribution on its download times. Which k servers do you choose in 
order to minimize the expected time until you have downloaded a complete copy of your 
file? 

To motivate the min-max problem, consider an editor who wishes to select k referees 
out of n qualified candidates so as to process an article as quickly as possible. Assume that 
the editor can estimate the time used by each referee as a random variable, and assume 
that the times used by each referee are independent. Then, assuming the editor will wait 
until all referee reports are received, the editor wishes to choose a set of referees that
minimizes the expected maximum of their times.

Call the variations in which we are given 2n independent random variables $Y_{i,s}$, for
$1 \le i \le n$ and $0 \le s \le 1$, and asked to choose a function $x : \{1, 2, \ldots, n\} \to \{0, 1\}$ so as
to optimize

\[ E\left[\min_i Y_{i,x(i)}\right], \tag{3} \]

the min-min expectation binary selection problem or the max-min expectation binary
selection problem.

We consider the complexity and approximability of these problems both for the case 
in which d is fixed and the case in which d is a parameter given in the input. Our results 
hold for either the binary selection or subset selection form of the problem. In Section 2
we show that the variation of the min-min problem in which d is fixed is NP-complete
(assuming $d \ge 3$), but admits a fully polynomial time approximation scheme. (See [§[ for
definitions.) Curiously, the max-min problem with fixed d can be solved in polynomial
time. In Section 3 we show that when d is variable, the min-min problem cannot be
approximated to within any fixed ratio in polynomial time unless P = NP; we leave open 
the complexity of the max-min problem with variable d. 

The following formula, which follows from summation by parts, will be useful. Suppose
that a random variable X assumes only the values $l_0, l_1, \ldots, l_{d-1}$. Then

\[ E[X] = l_0 + \sum_{j=1}^{d-1} (l_j - l_{j-1}) \Pr\{X \ge l_j\}. \tag{4} \]
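As a quick sanity check of formula (4), the sketch below compares the tail-sum form with the direct definition of expectation on a small example. The function names and the sample distribution are hypothetical; only the identity itself comes from the text.

```python
from fractions import Fraction

def expectation_direct(values, probs):
    """E[X] as the sum of value times probability."""
    return sum(v * p for v, p in zip(values, probs))

def expectation_tail(values, probs):
    """Formula (4): l_0 + sum_j (l_j - l_{j-1}) * Pr{X >= l_j}."""
    total = Fraction(values[0])
    for j in range(1, len(values)):
        tail = sum(probs[j:])                      # Pr{X >= l_j}
        total += (values[j] - values[j - 1]) * tail
    return total

# Hypothetical distribution on the values l_0 < l_1 < l_2.
values = [1, 3, 7]
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
assert expectation_tail(values, probs) == expectation_direct(values, probs)
```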



2 Fixed d 

In this section we consider the case in which d is fixed. It will be convenient to consider
scaled negative logarithms of the probabilities: we describe a given random variable X by
a vector $L = (L_1, L_2, \ldots, L_{d-1})$ where $L_j = -\gamma^{-1} \ln \Pr\{X \ge l_j\}$, so that

\[ \Pr\{X \ge l_j\} = e^{-\gamma L_j}. \tag{5} \]

(Since $\Pr\{X \ge l_0\}$ will always be one, we need not specify the value of $L_0$.) Here $\gamma$ is a
positive value to be chosen later. Note that the expectation of such a random variable is
given by the function

\[ f(L) = l_0 + \sum_{j=1}^{d-1} (l_j - l_{j-1}) e^{-\gamma L_j} \tag{6} \]

and that this function f is convex, i.e., for $\alpha \in [0, 1]$ and arbitrary $L$ and $L'$, we have

\[ f(\alpha L + (1 - \alpha) L') \le \alpha f(L) + (1 - \alpha) f(L'). \]

Note also that if $L$ and $L'$ specify the distributions of two independent random variables,
then the distribution of their minimum is specified by $L + L'$.
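The observation that minima correspond to vector sums can be checked numerically. The following is a small sketch under stated assumptions: the helper names `to_L` and `f`, the value of `gamma`, and the sample tail probabilities are all made up for illustration.

```python
import math

def to_L(tail_probs, gamma):
    """L_j = -(1/gamma) * ln Pr{X >= l_j}; a zero probability maps to +infinity."""
    return [math.inf if p == 0 else -math.log(p) / gamma for p in tail_probs]

def f(L, values, gamma):
    """Equation (6): expectation of the variable described by the vector L."""
    return values[0] + sum((values[j] - values[j - 1]) * math.exp(-gamma * L[j - 1])
                           for j in range(1, len(values)))

values = [1.0, 2.0, 5.0]      # l_0 < l_1 < l_2 (hypothetical)
gamma = 0.5                   # any positive scale factor works here
tails_X = [0.6, 0.2]          # Pr{X >= l_1}, Pr{X >= l_2}
tails_Y = [0.5, 0.5]

# Adding the L vectors multiplies the tail probabilities, which is exactly
# what taking the minimum of two independent variables does.
L_sum = [a + b for a, b in zip(to_L(tails_X, gamma), to_L(tails_Y, gamma))]
tails_min = [a * b for a, b in zip(tails_X, tails_Y)]
direct = values[0] + sum((values[j] - values[j - 1]) * tails_min[j - 1]
                         for j in range(1, len(values)))
assert math.isclose(f(L_sum, values, gamma), direct)
```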
Assume we are dealing with the binary selection problem and let $P_{i,s}$ be the vector of
scaled negative logarithms corresponding to the random variable $Y_{i,s}$, i.e., the jth
component $L_j$ of $P_{i,s}$ satisfies

\[ \Pr\{Y_{i,s} \ge l_j\} = e^{-\gamma L_j}. \]

Then given a selection $x : \{1, 2, \ldots, n\} \to \{0, 1\}$, we have

\[ E\left[\min_i Y_{i,x(i)}\right] = f\left(\sum_{i=1}^{n} P_{i,x(i)}\right). \tag{7} \]



2.1 NP-completeness of the min-min problem 

The min-min problem is hard even when all the random variables must assume values
chosen from a set of size three.

Theorem 1 The min-min expectation binary selection problem is NP-complete for any
fixed $d \ge 3$.

Proof. Membership in NP is apparent. To prove completeness we perform a polynomial
transformation from the subset-sum problem. Suppose we are given a set $\{z_1, \ldots, z_n\}$ of
nonnegative integers and asked whether the sum of some subset is equal to a given integer T.
Let M be the maximum of the $z_i$; we will assume that $T \le nM$ since otherwise the problem
is trivial. We show how to transform this problem into the min-min expectation binary
selection problem with d = 3. (The result for larger d follows trivially by a padding
argument.)

We choose $l_0 = 0$, $l_1 = 1$, and

\[ l_2 - l_1 = e^{2\gamma(nM - T)}, \tag{8} \]

where $\gamma$ is a positive number to be specified later. For $1 \le i \le n$ set

\[ P_{i,0} = (0, 2M) \]

and

\[ P_{i,1} = (z_i, 2M - z_i). \]

Note that each of these gives a valid set of probabilities; in particular, since $0 \le z_i \le M$
we have

\[ 1 \ge e^{-\gamma z_i} \ge e^{-\gamma(2M - z_i)} > 0. \]

We ask whether we can choose x so that the expected minimum of the selected variables
is at most $2e^{-\gamma T}$.
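Now suppose that a selection function x has been specified and let $S = \{i \mid x(i) = 1\}$.
Then using (7) and (6), and letting $\sigma = \sum_{i \in S} z_i$, we have

\begin{align*}
E\left[\min_i Y_{i,x(i)}\right]
&= l_0 + (l_1 - l_0) \exp\Big(-\gamma \sum_{i \in S} z_i\Big) + (l_2 - l_1) \exp\Big(-\gamma \Big(2nM - \sum_{i \in S} z_i\Big)\Big) \\
&= \exp\Big(-\gamma \sum_{i \in S} z_i\Big) + e^{2\gamma(nM - T)} \exp\Big(-\gamma \Big(2nM - \sum_{i \in S} z_i\Big)\Big) \quad \text{(where we have used (8))} \\
&= e^{-\gamma\sigma} + e^{2\gamma(nM - T)} e^{-\gamma(2nM - \sigma)} \\
&= e^{-\gamma\sigma} + e^{\gamma(\sigma - 2T)} \\
&= e^{-\gamma T} \left( e^{\gamma(T - \sigma)} + e^{\gamma(\sigma - T)} \right).
\end{align*}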

Since $e^{x} + e^{-x}$ is minimized at x = 0, it is clear that we can achieve an expected minimum
of $2e^{-\gamma T}$ if and only if we can choose S to make $\sigma = T$, i.e., if and only if the answer to
the subset sum problem is yes.
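The reduction can be exercised on a toy subset-sum instance. The sketch below is illustrative only: the function names are hypothetical, the search over selections is exhaustive, floating point stands in for the finite-precision argument, and the value $\gamma = 1/(2nM)$ is an assumption of the sketch (it requires at least one positive $z_i$).

```python
import math
from itertools import product

def reduction_value(z, T, x, gamma):
    """Expected minimum of the constructed instance under selection x (x[i] in {0, 1})."""
    n, M = len(z), max(z)
    sigma = sum(zi for zi, xi in zip(z, x) if xi)
    L1 = sigma                    # summed first components of the chosen vectors
    L2 = 2 * n * M - sigma        # summed second components
    l0, l1 = 0.0, 1.0
    l2 = l1 + math.exp(2 * gamma * (n * M - T))          # equation (8)
    return l0 + (l1 - l0) * math.exp(-gamma * L1) + (l2 - l1) * math.exp(-gamma * L2)

def subset_sum_via_reduction(z, T):
    """True if some selection reaches the threshold 2 * exp(-gamma * T)."""
    n, M = len(z), max(z)
    gamma = 1.0 / (2 * n * M)     # assumed scale factor; needs max(z) > 0
    target = 2 * math.exp(-gamma * T)
    best = min(reduction_value(z, T, x, gamma) for x in product((0, 1), repeat=n))
    return math.isclose(best, target, rel_tol=1e-9)

assert subset_sum_via_reduction([3, 5, 7], 12)       # 5 + 7 = 12
assert not subset_sum_via_reduction([3, 5, 7], 11)   # no subset sums to 11
```

The tolerance is far below the relative gap of roughly $\gamma^2/2$ that separates a miss by one from an exact hit on instances of this size.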

Two technical points need to be addressed. First, since the magnitudes of $z_i$, T, and M
can be exponentially large in the length of the input, one might fear that this
transformation would produce an image of exponential length. Second, of course, we cannot output
arbitrary reals in the constructed problem so we must use finite precision and consider
rounding problems. To resolve these problems, set $\gamma = 1/(2nM)$; then it is easy to verify
that no number output is larger than $e + 1$. Also note that to resolve the constructed
decision problem it is sufficient to be able to distinguish $e^{-\gamma T}(e^{\gamma} + e^{-\gamma})$ from $2e^{-\gamma T}$. It is
easy to verify that we need only give a number of bits that is polynomial in the input size
to achieve this. ■

This same construction shows that the min-min expectation subset selection problem is
also NP-complete: Since all n of the variables $Y_{i,0}$ constructed have the same distribution,
picking any n out of the 2n variables is equivalent to picking one from each pair.

Although the problem is NP-complete, the optimum solution can be easily approximated
for fixed d, assuming that the $l_j$ are nonnegative.

Theorem 2 With nonnegative $l_j$ and a fixed value of d, the min-min expectation binary
selection problem admits a fully polynomial time approximation scheme.

Proof. We use a standard rounding and dynamic programming approach. Assume we
are given some positive

\[ \epsilon \le 1. \tag{9} \]

Let the scale factor $\gamma$ in (5) be set at

7 = f • (10) 
on 

Note that the components of each $P_{i,s}$ are nonnegative real numbers (or $+\infty$ when the
corresponding probability is 0). Let $\hat{P}_{i,s}$ be a vector in which each real component x of $P_{i,s}$ is
rounded down to any integer in the range $[x - 2, x]$. (The entries that are $\infty$ will not be
rounded. We allow some flexibility in the rounding, rather than rounding to $\lfloor x \rfloor$, to avoid
having to do extremely precise calculations when x is very close to an integer.) We will
build a table A indexed by d-tuples of integers, specifying which vectors are achievable as
sums $\sum_{i=1}^{n} \hat{P}_{i,x(i)}$ for some selection function x.

A minor technical problem is that some components of the sum $\sum_{i=1}^{n} \hat{P}_{i,x(i)}$ of rounded
vectors may be infinite. To deal with this we note that the maximum value of any finite
component of $\sum_{i=1}^{n} \hat{P}_{i,x(i)}$ is bounded by $\gamma^{-1} \ln(1/p^*)$, where $p^*$ is the product of all of
the nonzero probabilities given in the input. Let T be an integer between $\gamma^{-1} \ln(1/p^*)$ and
$\gamma^{-1} \ln(1/p^*) + 2$. Then we need not consider any finite indices of elements of A which
exceed T. Recalling that N is the number of bits required to specify the input, we see that
$\ln(1/p^*)$ is $O(N)$, so

\[ T = O(\gamma^{-1} N). \tag{11} \]

Formally, we now define the value of $A(s, L_1, L_2, \ldots, L_{d-1})$, for $0 \le s \le n$ and
$L_j \in \{0, 1, \ldots, T, T + 1\}$, to be a boolean which is true if there exists a selection function x
such that for each j, the jth component of the sum $\sum_{i=1}^{s} \hat{P}_{i,x(i)}$ of rounded vectors is equal to

    $L_j$ if $L_j \le T$, and
    $\infty$ if $L_j = T + 1$.

This table has $O(nT^{d-1})$ entries, and successive entries can be computed in constant time
by a standard dynamic programming approach. Hence in view of (11) and then (10) the
time required to build the table is $O(nT^{d-1}) = O(n(nN/\epsilon)^{d-1})$, which is polynomial in the
input size and $\epsilon^{-1}$.

Once the table has been constructed, we simply evaluate the function f given in (6)
at the tuples that the table tells us are achievable. More formally, let $g(L)$ be a function
which maps components of L that are equal to T + 1 back to the infinity they represent,
i.e., the jth component of $g(L_1, L_2, \ldots, L_{d-1})$ is

    $L_j$ if $L_j \le T$, and
    $\infty$ if $L_j = T + 1$.

Then our estimate of the minimum expectation is

\[ E_{\min} = \min_{L \,:\, A(n, L) = \text{true}} f(g(L)). \]
Let $L_{\min}$ be the value of L at which the minimum is achieved, and let $x_{\min}$ be a selection
that achieves this minimum, so

\[ \sum_{i=1}^{n} \hat{P}_{i,x_{\min}(i)} = L_{\min}. \]

Since the rounded vectors $\hat{P}$ were computed from $P$ by rounding down, and f is a decreasing
function, we know that $E_{\min}$ is greater than or equal to the true optimum. Moreover, since
each component of each rounded vector was rounded down by at most 2, so that each component
of a sum of n of them is too small by at most 2n, and we have considered all possibilities for x
when constructing A, from inspection of f we know that $E_{\min}$ exceeds the true minimum by a
ratio of at most

\[ e^{2n\gamma} \le 1 + \epsilon/2, \]

where the inequality follows from (10) and (9).

This achieves a ratio of $1 + \epsilon/2$ assuming that the computations are exact; it is not hard
to verify that the arithmetic can be done to only polynomially many places and achieve a
ratio of $1 + \epsilon$. ■
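The rounding and table construction in the proof of Theorem 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the construction verbatim: the function name `fptas_min_min` and its input layout are hypothetical, the scale factor `gamma` is supplied by the caller rather than computed from (10), the table A is replaced by a Python set of achievable tuples (so the cap T is not needed), and rounding uses `math.floor` on floating-point logarithms, ignoring the precision issues the proof addresses.

```python
import math

def fptas_min_min(values, pairs, gamma):
    """Estimate the min-min binary selection optimum by rounding and a reachable-set DP.

    values: l_0 < l_1 < ... < l_{d-1}, all nonnegative.
    pairs[i][s]: tail probabilities (Pr{Y_{i,s} >= l_1}, ..., Pr{Y_{i,s} >= l_{d-1}}).
    gamma: positive scale factor chosen by the caller (an assumption of this sketch).
    """
    def rounded(tails):
        # Scaled negative logarithms, rounded down with floor; infinities are kept.
        return tuple(math.inf if p == 0 else math.floor(-math.log(p) / gamma)
                     for p in tails)

    # Sums of rounded vectors achievable using the pairs processed so far.
    achievable = {(0,) * (len(values) - 1)}
    for pair in pairs:
        options = [rounded(tails) for tails in pair]
        achievable = {tuple(a + b for a, b in zip(total, opt))
                      for total in achievable for opt in options}

    def f(L):
        # Equation (6); exp(-gamma * inf) evaluates to 0.0.
        return values[0] + sum((values[j] - values[j - 1]) * math.exp(-gamma * L[j - 1])
                               for j in range(1, len(values)))

    return min(f(L) for L in achievable)
```

Because `floor` lowers each component by less than 1, the estimate returned here is at least the true optimum and exceeds it by a factor of at most $e^{n\gamma}$, which is within the $e^{2n\gamma}$ allowance of the proof.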

Again, a similar algorithm gives an approximation scheme for the min-min expectation
subset selection problem.



2.2 The max-min problem is in P 

We were surprised to find that, in contrast to the difficulty of the min-min problem, the 
max-min expectation binary selection problem with fixed d can be solved in polynomial 
time. 

Some background is useful. Let $L_i$, for $1 \le i \le n$, be line segments (considered as point
sets). The Minkowski sum of these line segments is

\[ \sum_{i=1}^{n} L_i = \left\{ \sum_{i=1}^{n} p_i \;\middle|\; p_i \in L_i \right\}. \]

The Minkowski sum of a set of line segments is called a zonotope. For example, see Figure 1,
where, as suggested by Edelsbrunner 0, the construction of a zonotope is illustrated
inductively. If the n line segments correspond to the unit vectors for each of n dimensions,
the Minkowski sum is a hypercube in n dimensions, with $2^n$ vertices. With fixed dimension
$d - 1$, the number of vertices in the Minkowski sum of n segments grows much more slowly:
there are $O(n^{d-2})$ vertices, and they can be listed in $O(n^{d-2} + n \log n)$ time [Q.
Going back to our problem, let

\[ S = \left\{ \sum_{i=1}^{n} \big( \alpha_i P_{i,0} + (1 - \alpha_i) P_{i,1} \big) \;\middle|\; \alpha_i \in \{0, 1\} \right\}. \]
Figure 1: Illustration of the construction of a zonotope. We start with the interval from
(0, 0) to (0, 1). We sweep this by the second interval, from (0, 0) to (1, 1), to form a rhombus.
We then sweep this rhombus by the third interval, from (0, 0) to (1, −1), to form a hexagon.
Note that (1, 0) and (1, 1) are not vertices of this hexagon.
We wish to find the maximum value of f at any point in S. Let K be the convex hull of
these points, so K is the zonotope

\[ K = \left\{ \sum_{i=1}^{n} \big( \alpha_i P_{i,0} + (1 - \alpha_i) P_{i,1} \big) \;\middle|\; \alpha_i \in [0, 1] \right\}. \]

All vertices of K are in S, but not all points in S are necessarily vertices of K. Since f is
convex, we know that the maximum value of f over S will be achieved at a vertex of K.
Thus we can find the maximum in polynomial time by an exhaustive search of the vertices
of K.
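The relationship between S and the vertices of K can be seen on the three segments of Figure 1. The sketch below enumerates the 0/1 combinations by brute force and takes their convex hull with Andrew's monotone chain, a standard planar hull method swapped in here for illustration; it is not the output-sensitive zonotope enumeration cited above. The helper names and the sample convex function `g` are hypothetical.

```python
import math
from itertools import product

def cross(o, a, b):
    """Signed area of the turn o -> a -> b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def half(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]
    return half(pts) + half(reversed(pts))

# Each segment of Figure 1 runs from the origin to the listed endpoint.
segments = [(0, 1), (1, 1), (1, -1)]
S = [tuple(sum(a * s[c] for a, s in zip(alpha, segments)) for c in range(2))
     for alpha in product((0, 1), repeat=len(segments))]
vertices = convex_hull(S)

# A convex function attains its maximum over S at a vertex of the hull.
g = lambda pt: math.exp(-0.3 * pt[0]) + 2 * math.exp(-0.3 * pt[1])
assert max(map(g, S)) == max(map(g, vertices))
```

Of the eight points in S, only six survive as hull vertices, so searching the vertices alone already skips points such as (1, 0) and (1, 1).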

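Solving the max-min expectation subset selection problem (picking k out of n variables)
is done similarly, except that, writing $P_i$ for the vector of scaled negative logarithms
corresponding to $Y_i$, we need to define

\[ S = \left\{ \sum_{i=1}^{n} \alpha_i P_i \;\middle|\; \alpha_i \in \{0, 1\} \text{ and } \sum_{i=1}^{n} \alpha_i = k \right\} \]

and

\[ K = \left\{ \sum_{i=1}^{n} \alpha_i P_i \;\middle|\; \alpha_i \in [0, 1] \text{ and } \sum_{i=1}^{n} \alpha_i = k \right\}. \]

This K is not in general a zonotope, but rather the intersection of a zonotope with the
hyperplane given by

\[ \sum_{i=1}^{n} \alpha_i = k. \]

By results of |T[], we can still list the vertices in polynomial time, and hence produce a
polynomial time algorithm by exhaustive search.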

A technical point arises here: we have been assuming that we do arithmetic with real
numbers. To show that the problem is in P, we sketch here a proof that we need only
carry the computations to $O(N)$ places of accuracy in order to determine the point at
which the exact optimum occurs. First let q be the product of all of the denominators of
fractions appearing in the input, and note that $\log q = O(N)$. It is then easy to see that
the expectation of the minimum of any subset of the random variables can be expressed
as $i/q$ for some integer i. Thus to see where the maximum occurs, we only need to be able
to perform the computations accurately enough to be able to correctly compare quantities
that differ by at least $1/q$. Since the correct answer, even without the assumption that the
$l_j$ are nonnegative, is bounded by $\max_j |l_j|$, it follows that computations need only be done to
$O(N)$ places of accuracy.



3 Variable d 



Theorem 3 The problem of approximating the min-min expectation binary selection with 
unrestricted d to within any constant factor r is NP-hard. 
Proof. We will transform CNF-sat to min-min expectation binary selection in a way
which gives a large gap between the optimum solutions for satisfiable expressions and
unsatisfiable expressions.

Suppose we are given a boolean formula F on variables $X_1, \ldots, X_n$, with clauses
$C_0, C_1, \ldots, C_{c-1}$. We assume without loss of generality that $n \ge 1$ and $c \ge 1$. For any
literal L and clause C, let $I(L, C)$ be 1 if L is present in C, and 0 otherwise. Let $\mathrm{Atom}(p, x)$
denote an atom of weight p and location x. Choose

\[ p = \frac{1}{r(c + 1)} \tag{12} \]

and

\[ v = r^n (c + 1)^n = p^{-n}. \tag{13} \]
Construct a min-min expectation binary selection problem M with, for $1 \le i \le n$, variables

\[ Y_{i,1} \text{ with distribution } \mathrm{Atom}(p^c, v^{c-1}) + \sum_{s=0}^{c-1} \mathrm{Atom}\big((1 - p)p^s,\; v^{s - I(X_i, C_s)}\big) \tag{14} \]

and

\[ Y_{i,0} \text{ with distribution } \mathrm{Atom}(p^c, v^{c-1}) + \sum_{s=0}^{c-1} \mathrm{Atom}\big((1 - p)p^s,\; v^{s - I(\bar{X}_i, C_s)}\big). \tag{15} \]

We show that if F is satisfiable then the optimum solution for M is at most $1/r$, but if F
is unsatisfiable the optimum solution is at least 1.

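To begin, fix integers i and s, with $0 \le s < c$, and consider the probability that a single
variable $Y_{i,1}$ is at least $v^s$. If the literal $X_i$ is not in clause $C_s$, then from (14) the total
weight of all atoms in $Y_{i,1}$ with locations at or beyond $v^s$ is $p^c + \sum_{t=s}^{c-1} (1 - p)p^t = p^s$. If,
however, the literal $X_i$ is in clause $C_s$, then a term $(1 - p)p^s$ disappears from this sum so
$Y_{i,1}$ is at least $v^s$ with probability only $p^{s+1}$. In summary,

\[ \Pr\{Y_{i,1} \ge v^s\} = \begin{cases} p^{s+1} & \text{if } X_i \text{ is in } C_s, \\ p^{s} & \text{if } X_i \text{ is not in } C_s. \end{cases} \tag{16} \]

Similarly from (15)

\[ \Pr\{Y_{i,0} \ge v^s\} = \begin{cases} p^{s+1} & \text{if } \bar{X}_i \text{ is in } C_s, \\ p^{s} & \text{if } \bar{X}_i \text{ is not in } C_s. \end{cases} \tag{17} \]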

Next consider the probability that the minimum of the selected variables is at least $v^s$,
i.e.,

\[ \Pr\Big\{\min_i Y_{i,x(i)} \ge v^s\Big\} = \prod_{i=1}^{n} \Pr\{Y_{i,x(i)} \ge v^s\}. \]
Using (16) and (17), if the assignment $X_i = x(i)$ does not satisfy clause $C_s$, this product
is just $p^{sn}$. If, on the other hand, the assignment does satisfy clause $C_s$, then at least
one of the terms in the product will be bounded by $p^{s+1}$, so the product is at most $p^{sn+1}$.
Summarizing,

\[ \Pr\Big\{\min_i Y_{i,x(i)} \ge v^s\Big\} \text{ is } \begin{cases} = p^{sn} & \text{if } x \text{ does not satisfy } C_s, \\ \le p^{sn+1} & \text{if } x \text{ satisfies } C_s. \end{cases} \tag{18} \]

Finally, consider the expectation of the minimum of the selected variables. Note that
the possible values for the random variables are $v^{-1}, 1, v, v^2, \ldots, v^{c-1}$, so using (4) we have

\[ E\Big[\min_i Y_{i,x(i)}\Big] = v^{-1} + \sum_{s=0}^{c-1} \Pr\Big\{\min_i Y_{i,x(i)} \ge v^s\Big\} \big(v^s - v^{s-1}\big). \]

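Using (18), we see that if x satisfies all of the clauses, this becomes

\begin{align*}
E\Big[\min_i Y_{i,x(i)}\Big]
&\le v^{-1} + \sum_{s=0}^{c-1} p^{sn+1} \big(v^s - v^{s-1}\big) \\
&= v^{-1} + (1 - v^{-1}) \sum_{s=0}^{c-1} p^{sn+1} v^s \\
&= p^n + (1 - p^n) \sum_{s=0}^{c-1} p \quad \text{(by using (13))} \\
&= p^n + (1 - p^n) c p.
\end{align*}

Since $n \ge 1$ this is less than

\[ p + cp = p(c + 1) = \frac{1}{r}. \tag{19} \]

If, on the other hand, x violates some clause, say $C_s$, again using (18) we have

\begin{align*}
E\Big[\min_i Y_{i,x(i)}\Big]
&\ge v^{-1} + p^{sn} (1 - v^{-1}) v^s \\
&= p^n + (1 - p^n) = 1 \quad \text{(by using (13))}. \tag{20}
\end{align*}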

Thus approximating the solution to within a factor of r would enable us to distinguish 
between satisfiable and unsatisfiable expressions. ■ 
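The gap in Theorem 3 can be checked exactly on tiny formulas. The sketch below is a hedged illustration: the function names and the clause encoding (+i for $X_i$, −i for its negation) are hypothetical, the top atom of each variable is assumed to sit at $v^{c-1}$ so that the support is $v^{-1}, 1, \ldots, v^{c-1}$, and all selections are enumerated by brute force.

```python
from fractions import Fraction
from itertools import product

def build(n, clauses, r):
    """Atom lists (weight, location) for Y_{i,b}, keyed by (i, b) with i in 1..n."""
    c = len(clauses)
    p = Fraction(1, r * (c + 1))                 # (12)
    v = p ** -n                                  # (13)
    def dist(lit):
        atoms = [(p ** c, v ** (c - 1))]         # assumed location of the top atom
        atoms += [((1 - p) * p ** s, v ** (s - int(lit in clauses[s])))
                  for s in range(c)]
        return atoms
    return {(i, b): dist(i if b else -i) for i in range(1, n + 1) for b in (0, 1)}

def expected_min(dists):
    """Formula (4) applied to the minimum of independent variables given as atom lists."""
    support = sorted({loc for d in dists for _, loc in d})
    total, prev = support[0], support[0]
    for t in support[1:]:
        prob = Fraction(1)
        for d in dists:
            prob *= sum(w for w, loc in d if loc >= t)   # Pr{Y >= t}
        total += (t - prev) * prob
        prev = t
    return total

def optimum(n, clauses, r):
    """Smallest expected minimum over all 2^n selections."""
    Y = build(n, clauses, r)
    return min(expected_min([Y[(i, x[i - 1])] for i in range(1, n + 1)])
               for x in product((0, 1), repeat=n))

r = 3
satisfiable = [{1, -2}, {2}]      # (X1 or not X2) and (X2)
unsatisfiable = [{1}, {-1}]       # (X1) and (not X1)
assert optimum(2, satisfiable, r) <= Fraction(1, r)
assert optimum(1, unsatisfiable, r) >= 1
```

Exact rational arithmetic keeps the very large and very small locations $v^s$ from interfering with the comparison against $1/r$ and 1.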
Theorem 4 The problem of approximating the min-min expectation subset selection with
unrestricted d to within any constant factor r is NP-hard.

Proof sketch. The proof is similar to that of the previous theorem, so we only briefly
describe the necessary changes. We set

\[ p = \frac{1}{r(n + c + 1)}. \tag{21} \]

Now, for $1 \le i \le n$, we construct variables $Y_{i,1}$ with distribution

c— 1 n— 1 

s=0 s=0 

(22) 

and variables $Y_{i,0}$ with distribution

c— 1 n— 1 

Atom(p^+", v''+'') + Atom ((1 - p)p', y'-^^^^'^^^) + ^ Atom ((1 - v''+'-^''^) , 

s=0 s=0 

(23) 

where as usual we define

\[ \delta_{is} = \begin{cases} 1 & \text{if } i = s, \\ 0 & \text{otherwise.} \end{cases} \]

The problem we construct is to choose n of these 2n variables so as to minimize the expected
value of the minimum of the selected variables; let Z be distributed as the minimum of the
selected variables. If for some i neither of $Y_{i,0}$ and $Y_{i,1}$ is selected, arguing much as before
we have $E[Z] \ge 1$. Thus in order to get an expectation below 1 we must pick at least one
of each pair. But since there are n pairs and we must pick exactly n variables, this means
we can also pick at most one, and hence must pick exactly one, from each pair. Again as
before, if the selected variables do not correspond to a satisfying assignment, we will have
$E[Z] \ge 1$. On the other hand, if they do correspond to a satisfying assignment, we will
have

\[ E[Z] \le p^n + (1 - p^n)(c + n)p < p + (c + n)p = \frac{1}{r}. \]

Thus approximating the solution to within a factor of r would again enable us to distinguish
between satisfiable and unsatisfiable expressions. ■